Introduction

This workflow displays a variety of interactive Plotly visualizations to explore the mouse melanoma data published in the recent Pozniak, et al 2022. Plotly visualizations found in this workflow are also found in the accompanying R Shiny App.

library("dplyr")
library("tibble")
library("ggplot2")
library("ggpubr")
library("GSVA")
library("readr")
library("heatmaply")
library("plotly")

Load data

The aggregated NRAS13_all_43k cell dataset and test gene set consisting of two gene files are loaded:

## read aggregated count human malignant dataset 
df1 <- readRDS(file = "./counts/NRAS13_all_43k_agg_counts.rds")
# df2 <- readRDS(file = "./counts/NRAS13_malign_preprint_agg_counts.rds")
# head(df1)

Gene lists

This workflow is primarily designed with the intention to correlate two gene sets. Here we load such a data file with two toy gene sets. It is possible to upload a custom user-generated gene set as long as the file is a CSV or TSV file and the gene sets are the first and second columns. This workflow ignores additional columns, and these columns may be named whatever the end user wishes.

## read gene sets of interest to assay in the dataset
g1 <- readr::read_csv("./counts/mouse_test_genes.csv")

## display gene lists as data table
g1 %>% 
  DT::datatable()

Create gene list

This block of code puts each row of the test gene set into a list with each element in the list given the same name as the original dataframe columns, however NA values are removed from the list elements.

gl <- list()
for (i in 1:ncol(g1)) {
  g1 %>% 
    dplyr::select(i) %>%
    dplyr::pull() -> gl[[i]]
  gl[[i]] <- gl[[i]][!is.na(gl[[i]])]
}
names(gl) <- names(g1)

Run GSVA

The app applies GSVA on the gene sets found in the test gene file. This is the most time consuming computational step in the app.

## run ssGSEA
GSVA::gsva(expr = as.matrix(df1 %>% dplyr::select(-gene) %>% as.data.frame()),
           gset.idx.list = gl) -> gsva_sig
## Estimating GSVA scores for 2 gene sets.
## Estimating ECDFs with Gaussian kernels
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |======================================================================| 100%

Violin Plots

The dataframe resulting from the GSVA step is converted to a long dataframe and the information contained in the aggregated cell names is used to re-label the aggregated cell types by sample.

## convert data for violin Plot 
gsva_sig %>%
  as.data.frame() %>%
  tibble::rownames_to_column() %>%
  dplyr::rename(gene_sig = names(.)[1]) %>%
  tidyr::gather(key = "cell_name", value = "expression", -gene_sig) %>%
  dplyr::mutate(sample = gsub("^[^_]*.", "",
                              gsub("_[^_]+$", "", cell_name))) %>%
  dplyr::mutate(cell_type = dplyr::case_when(
    grepl("B-cell", cell_name) ~ "B-cell",
    grepl("CAF", cell_name) ~ "CAF",
    grepl("DC", cell_name) ~ "DC",
    grepl("EC", cell_name) ~ "EC",
    grepl("Malignant", cell_name) ~ "Malignant",
    grepl("Monocyte", cell_name) ~ "Monocyte/macrophage",
    grepl("Pericyte", cell_name) ~ "Pericyte",
    grepl("NKcell", cell_name) ~ "T/NKcell")) %>%
  as.data.frame() -> a1

Violin plots by gene set score

This plot displays gene expression by cell type for each gene signature. Due to UI space constraints in the app, we will probably be limited to displaying only the first two gene sets.

a1 %>%
  ggplot(aes(x = cell_type, y = expression, fill = cell_type)) +
  geom_violin(alpha = 0.8) +
  geom_point(position = position_jitter(seed = 1, width = 0.2),
             size = 0.4, alpha = 0.8) +
  theme_classic() +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 45, hjust = 1)) +
  ggtitle("Violin plot of gene signature scores by cell type") +
  facet_grid(~gene_sig) -> p1
ggplotly(p1)
# p1

Heatmaps

This block of code selects all of the genes found in the text file, extracts the counts for each sample and averages them for these genes. The gene counts have been transformed to log scale in order for highly expressed genes not skew the gradient scale and drown out subtle changes in expression.

df1 %>%
  dplyr::filter(gene %in% unlist(g1)[!is.na(unlist(g1))]) %>%
  tidyr::gather(key = "cell_name", value = "expression", -gene) %>%
  dplyr::mutate(sample = gsub("^[^_]*.", "",
                              gsub("_[^_]+$", "", cell_name))) %>%
  dplyr::mutate(cell_type = dplyr::case_when(
    grepl("B-cell", cell_name) ~ "B-cell",
    grepl("CAF", cell_name) ~ "CAF",
    grepl("DC", cell_name) ~ "DC",
    grepl("EC", cell_name) ~ "EC",
    grepl("Malignant", cell_name) ~ "Malignant",
    grepl("Monocyte", cell_name) ~ "Monocyte/macrophage",
    grepl("Pericyte", cell_name) ~ "Pericyte",
    grepl("NKcell", cell_name) ~ "T/NKcell")) %>%
  dplyr::group_by(gene, cell_type) %>%
  dplyr::summarise(mean_exp = mean(expression)) %>%
  tidyr::spread(key = "cell_type", value = "mean_exp") %>%
  tibble::column_to_rownames(var = "gene") %>%
  dplyr::mutate_all(log) %>%
  as.data.frame() -> r1
is.na(r1) <- sapply(r1, is.infinite)
r1[is.na(r1)] <- 0

heatmaply::heatmaply(
    x = r1,
    main = "Heatmap of log gene counts",
    xlab = "Cell types",
    ylab = "Genes of interest",
    scale_fill_gradient_fun = ggplot2::scale_fill_gradient2(
      low = "blue",
      high = "firebrick"))

Correlation log gene expression heatmap

df1 %>%
  dplyr::filter(gene %in% unlist(g1)[!is.na(unlist(g1))]) %>%
  tidyr::gather(key = "cell_name", value = "expression", -gene) %>%
  dplyr::mutate(sample = gsub("^[^_]*.", "",
                              gsub("_[^_]+$", "", cell_name))) %>%
  dplyr::mutate(cell_type = dplyr::case_when(
    grepl("B-cell", cell_name) ~ "B-cell",
    grepl("CAF", cell_name) ~ "CAF",
    grepl("DC", cell_name) ~ "DC",
    grepl("EC", cell_name) ~ "EC",
    grepl("Malignant", cell_name) ~ "Malignant",
    grepl("Monocyte", cell_name) ~ "Monocyte/macrophage",
    grepl("Pericyte", cell_name) ~ "Pericyte",
    grepl("NKcell", cell_name) ~ "T/NKcell")) %>%
  dplyr::group_by(gene, cell_type) %>%
  dplyr::summarise(mean_exp = mean(expression)) %>%
  tidyr::spread(key = "cell_type", value = "mean_exp") %>%
  tibble::column_to_rownames(var = "gene") %>%
  dplyr::mutate_all(log) %>%
  as.data.frame() -> r2
is.na(r2) <- sapply(r2, is.infinite)
r2[is.na(r2)] <- 0

r2 %>%
  t() %>%
  cor() %>%
heatmaply::heatmaply(
    main = "Correlation Heatmap of log gene counts across samples",
    xlab = "Cell types by time point",
    ylab = "Genes of interest",
    scale_fill_gradient_fun = ggplot2::scale_fill_gradient2(
      low = "blue",
      high = "firebrick"))

Correlation heatmap across samples

r2 %>%
  cor() %>%
heatmaply::heatmaply(
    main = "Correlation Heatmap of log gene counts across samples",
    xlab = "Cell types by time point",
    ylab = "Genes of interest",
    scale_fill_gradient_fun = ggplot2::scale_fill_gradient2(
      low = "blue",
      high = "firebrick"))

Correlation plots

The following correlation plots display how each gene set correlate each gene set with each other across cell types and time points. The correlation for each group is reported by displaying the R value and p-value.

gsva_sig %>%
  t() %>% as.data.frame() %>%
  tibble::rownames_to_column() %>%
  dplyr::rename(sample_name = names(.)[1]) %>%
  dplyr::mutate(sample = gsub("^[^_]*.", "",
                              gsub("_[^_]+$", "", sample_name))) %>%
  dplyr::mutate(cell_type = dplyr::case_when(
    grepl("B-cell", sample_name) ~ "B-cell",
    grepl("CAF", sample_name) ~ "CAF",
    grepl("DC", sample_name) ~ "DC",
    grepl("EC", sample_name) ~ "EC",
    grepl("Malignant", sample_name) ~ "Malignant",
    grepl("Monocyte", sample_name) ~ "Monocyte/macrophage",
    grepl("Pericyte", sample_name) ~ "Pericyte",
    grepl("NKcell", sample_name) ~ "T/NKcell")) %>%
  as.data.frame() -> d1

Correlation between signature scores combined across cell types and timepoints

d1 %>%
  dplyr::group_by(sample) %>%
  dplyr::summarise(mean_gene1 = mean(gene1),
                   mean_gene2 = mean(gene2)) %>%
  ggplot(aes(x = mean_gene1, y = mean_gene2)) +
  geom_point(size = 1, color = "black") +
  stat_smooth(method = "lm", se = TRUE, fill = "gray", color = "darkgray",
              formula = y ~ poly(x, 1, raw = TRUE)) +
  ggpubr::stat_cor(method = "spearman", label.x = 0) +
  theme_classic() +
  xlab("Signature Score 1") +
  ylab("Signature Score 2") +
  theme(axis.title = element_text(size = 14),
        legend.position = "none") +
  ggtitle(paste0("Correlation between signature scores across cell types")
          ) -> p0
ggplotly(p0)

Correlation between signature scores across cell types

d1 %>%
  dplyr::filter(!grepl("Patient", cell_type)) %>%
  ggplot(aes(x = gene1, y = gene2)) +
  geom_point(size = 1, aes(color = cell_type)) +
  stat_smooth(method = "lm", se = TRUE, fill = "gray", color = "darkgray",
              formula = y ~ poly(x, 1, raw = TRUE)) +
  ggpubr::stat_cor(method = "spearman", label.x = 0) +
  theme_classic() +
  xlab("Signature Score 1") +
  ylab("Signature Score 2") +
  theme(axis.title = element_text(size = 14),
        legend.position = "none") +
  ggtitle(paste0("Correlation between signature scores across cell types")) +
  facet_grid(cols = vars(cell_type)) -> p4
# p4
ggplotly(p4)
sessioninfo::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.2.2 (2022-10-31)
##  os       macOS Big Sur ... 10.16
##  system   x86_64, darwin17.0
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       Europe/Brussels
##  date     2022-12-15
##  pandoc   2.19.2 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package              * version   date (UTC) lib source
##  abind                  1.4-5     2016-07-21 [1] CRAN (R 4.2.0)
##  annotate               1.74.0    2022-04-26 [1] Bioconductor
##  AnnotationDbi          1.58.0    2022-04-26 [1] Bioconductor
##  assertthat             0.2.1     2019-03-21 [1] CRAN (R 4.2.0)
##  backports              1.4.1     2021-12-13 [1] CRAN (R 4.2.0)
##  beachmat               2.12.0    2022-04-26 [1] Bioconductor
##  Biobase                2.56.0    2022-04-26 [1] Bioconductor
##  BiocGenerics           0.42.0    2022-04-26 [1] Bioconductor
##  BiocParallel           1.30.4    2022-10-11 [1] Bioconductor
##  BiocSingular           1.12.0    2022-04-26 [1] Bioconductor
##  Biostrings             2.64.1    2022-08-18 [1] Bioconductor
##  bit                    4.0.5     2022-11-15 [1] CRAN (R 4.2.2)
##  bit64                  4.0.5     2020-08-30 [1] CRAN (R 4.2.0)
##  bitops                 1.0-7     2021-04-24 [1] CRAN (R 4.2.0)
##  blob                   1.2.3     2022-04-10 [1] CRAN (R 4.2.0)
##  broom                  1.0.1     2022-08-29 [1] CRAN (R 4.2.0)
##  bslib                  0.4.1     2022-11-02 [1] CRAN (R 4.2.2)
##  ca                     0.71.1    2020-01-24 [1] CRAN (R 4.2.0)
##  cachem                 1.0.6     2021-08-19 [1] CRAN (R 4.2.0)
##  car                    3.1-1     2022-10-19 [1] CRAN (R 4.2.1)
##  carData                3.0-5     2022-01-06 [1] CRAN (R 4.2.0)
##  cli                    3.4.1     2022-09-23 [1] CRAN (R 4.2.0)
##  codetools              0.2-18    2020-11-04 [1] CRAN (R 4.2.2)
##  colorspace             2.0-3     2022-02-21 [1] CRAN (R 4.2.0)
##  crayon                 1.5.2     2022-09-29 [1] CRAN (R 4.2.1)
##  crosstalk              1.2.0     2021-11-04 [1] CRAN (R 4.2.0)
##  data.table             1.14.6    2022-11-16 [1] CRAN (R 4.2.2)
##  DBI                    1.1.3     2022-06-18 [1] CRAN (R 4.2.0)
##  DelayedArray           0.22.0    2022-04-26 [1] Bioconductor
##  DelayedMatrixStats     1.18.2    2022-10-13 [1] Bioconductor
##  dendextend             1.16.0    2022-07-04 [1] CRAN (R 4.2.0)
##  digest                 0.6.31    2022-12-11 [1] CRAN (R 4.2.2)
##  dplyr                * 1.0.10    2022-09-01 [1] CRAN (R 4.2.0)
##  DT                     0.26      2022-10-19 [1] CRAN (R 4.2.1)
##  ellipsis               0.3.2     2021-04-29 [1] CRAN (R 4.2.0)
##  evaluate               0.19      2022-12-13 [1] CRAN (R 4.2.2)
##  fansi                  1.0.3     2022-03-24 [1] CRAN (R 4.2.0)
##  farver                 2.1.1     2022-07-06 [1] CRAN (R 4.2.0)
##  fastmap                1.1.0     2021-01-25 [1] CRAN (R 4.2.0)
##  foreach                1.5.2     2022-02-02 [1] CRAN (R 4.2.0)
##  generics               0.1.3     2022-07-05 [1] CRAN (R 4.2.0)
##  GenomeInfoDb           1.32.4    2022-09-06 [1] Bioconductor
##  GenomeInfoDbData       1.2.8     2022-04-25 [1] Bioconductor
##  GenomicRanges          1.48.0    2022-04-26 [1] Bioconductor
##  ggplot2              * 3.4.0     2022-11-04 [1] CRAN (R 4.2.2)
##  ggpubr               * 0.5.0     2022-11-16 [1] CRAN (R 4.2.2)
##  ggsignif               0.6.4     2022-10-13 [1] CRAN (R 4.2.1)
##  glue                   1.6.2     2022-02-24 [1] CRAN (R 4.2.0)
##  graph                  1.74.0    2022-04-26 [1] Bioconductor
##  gridExtra              2.3       2017-09-09 [1] CRAN (R 4.2.0)
##  GSEABase               1.58.0    2022-04-26 [1] Bioconductor
##  GSVA                 * 1.44.5    2022-09-22 [1] Bioconductor
##  gtable                 0.3.1     2022-09-01 [1] CRAN (R 4.2.0)
##  HDF5Array              1.24.2    2022-08-02 [1] Bioconductor
##  heatmaply            * 1.4.0     2022-10-08 [1] CRAN (R 4.2.1)
##  hms                    1.1.2     2022-08-19 [1] CRAN (R 4.2.0)
##  htmltools              0.5.4     2022-12-07 [1] CRAN (R 4.2.2)
##  htmlwidgets            1.5.4     2021-09-08 [1] CRAN (R 4.2.0)
##  httr                   1.4.4     2022-08-17 [1] CRAN (R 4.2.0)
##  IRanges                2.30.1    2022-08-18 [1] Bioconductor
##  irlba                  2.3.5.1   2022-10-03 [1] CRAN (R 4.2.0)
##  iterators              1.0.14    2022-02-05 [1] CRAN (R 4.2.0)
##  jquerylib              0.1.4     2021-04-26 [1] CRAN (R 4.2.0)
##  jsonlite               1.8.4     2022-12-06 [1] CRAN (R 4.2.2)
##  KEGGREST               1.36.3    2022-07-12 [1] Bioconductor
##  knitr                  1.41      2022-11-18 [1] CRAN (R 4.2.0)
##  labeling               0.4.2     2020-10-20 [1] CRAN (R 4.2.0)
##  lattice                0.20-45   2021-09-22 [1] CRAN (R 4.2.2)
##  lazyeval               0.2.2     2019-03-15 [1] CRAN (R 4.2.0)
##  lifecycle              1.0.3     2022-10-07 [1] CRAN (R 4.2.1)
##  magrittr               2.0.3     2022-03-30 [1] CRAN (R 4.2.0)
##  Matrix                 1.5-3     2022-11-11 [1] CRAN (R 4.2.0)
##  MatrixGenerics         1.8.1     2022-06-26 [1] Bioconductor
##  matrixStats            0.63.0    2022-11-18 [1] CRAN (R 4.2.0)
##  memoise                2.0.1     2021-11-26 [1] CRAN (R 4.2.0)
##  mgcv                   1.8-41    2022-10-21 [1] CRAN (R 4.2.2)
##  munsell                0.5.0     2018-06-12 [1] CRAN (R 4.2.0)
##  nlme                   3.1-160   2022-10-10 [1] CRAN (R 4.2.2)
##  pillar                 1.8.1     2022-08-19 [1] CRAN (R 4.2.0)
##  pkgconfig              2.0.3     2019-09-22 [1] CRAN (R 4.2.0)
##  plotly               * 4.10.1    2022-11-07 [1] CRAN (R 4.2.2)
##  plyr                   1.8.8     2022-11-11 [1] CRAN (R 4.2.0)
##  png                    0.1-8     2022-11-29 [1] CRAN (R 4.2.2)
##  purrr                  0.3.5     2022-10-06 [1] CRAN (R 4.2.1)
##  R6                     2.5.1     2021-08-19 [1] CRAN (R 4.2.0)
##  RColorBrewer           1.1-3     2022-04-03 [1] CRAN (R 4.2.0)
##  Rcpp                   1.0.9     2022-07-08 [1] CRAN (R 4.2.0)
##  RCurl                  1.98-1.9  2022-10-03 [1] CRAN (R 4.2.1)
##  readr                * 2.1.3     2022-10-01 [1] CRAN (R 4.2.1)
##  registry               0.5-1     2019-03-05 [1] CRAN (R 4.2.0)
##  reshape2               1.4.4     2020-04-09 [1] CRAN (R 4.2.0)
##  rhdf5                  2.40.0    2022-04-26 [1] Bioconductor
##  rhdf5filters           1.8.0     2022-04-26 [1] Bioconductor
##  Rhdf5lib               1.18.2    2022-05-15 [1] Bioconductor
##  rlang                  1.0.6     2022-09-24 [1] CRAN (R 4.2.0)
##  rmarkdown              2.18      2022-11-09 [1] CRAN (R 4.2.0)
##  RSQLite                2.2.19    2022-11-24 [1] CRAN (R 4.2.2)
##  rstatix                0.7.1     2022-11-09 [1] CRAN (R 4.2.0)
##  rstudioapi             0.14      2022-08-22 [1] CRAN (R 4.2.0)
##  rsvd                   1.0.5     2021-04-16 [1] CRAN (R 4.2.0)
##  S4Vectors              0.34.0    2022-04-26 [1] Bioconductor
##  sass                   0.4.4     2022-11-24 [1] CRAN (R 4.2.0)
##  ScaledMatrix           1.4.1     2022-09-11 [1] Bioconductor
##  scales                 1.2.1     2022-08-20 [1] CRAN (R 4.2.0)
##  seriation              1.4.0     2022-10-21 [1] CRAN (R 4.2.0)
##  sessioninfo            1.2.2     2021-12-06 [1] CRAN (R 4.2.0)
##  SingleCellExperiment   1.18.1    2022-10-02 [1] Bioconductor
##  sparseMatrixStats      1.8.0     2022-04-26 [1] Bioconductor
##  stringi                1.7.8     2022-07-11 [1] CRAN (R 4.2.0)
##  stringr                1.5.0     2022-12-02 [1] CRAN (R 4.2.2)
##  SummarizedExperiment   1.26.1    2022-05-01 [1] Bioconductor
##  tibble               * 3.1.8     2022-07-22 [1] CRAN (R 4.2.0)
##  tidyr                  1.2.1     2022-09-08 [1] CRAN (R 4.2.0)
##  tidyselect             1.2.0     2022-10-10 [1] CRAN (R 4.2.1)
##  TSP                    1.2-1     2022-07-14 [1] CRAN (R 4.2.0)
##  tzdb                   0.3.0     2022-03-28 [1] CRAN (R 4.2.0)
##  utf8                   1.2.2     2021-07-24 [1] CRAN (R 4.2.0)
##  vctrs                  0.5.1     2022-11-16 [1] CRAN (R 4.2.2)
##  viridis              * 0.6.2     2021-10-13 [1] CRAN (R 4.2.0)
##  viridisLite          * 0.4.1     2022-08-22 [1] CRAN (R 4.2.0)
##  vroom                  1.6.0     2022-09-30 [1] CRAN (R 4.2.0)
##  webshot                0.5.4     2022-09-26 [1] CRAN (R 4.2.1)
##  withr                  2.5.0     2022-03-03 [1] CRAN (R 4.2.0)
##  xfun                   0.35      2022-11-16 [1] CRAN (R 4.2.2)
##  XML                    3.99-0.13 2022-12-04 [1] CRAN (R 4.2.2)
##  xtable                 1.8-4     2019-04-21 [1] CRAN (R 4.2.0)
##  XVector                0.36.0    2022-04-26 [1] Bioconductor
##  yaml                   2.3.6     2022-10-18 [1] CRAN (R 4.2.1)
##  zlibbioc               1.42.0    2022-04-26 [1] Bioconductor
## 
##  [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
## 
## ──────────────────────────────────────────────────────────────────────────────